Realtime Viterbi Searching for Practical Telephone Speech Recognition Systems
Authors
Abstract
This paper studies the searching and pruning process of a telephone speech recognition system for a Private Automatic Branch Exchange (PABX), both to explore the problems that may arise when applying speech recognition to the telephone network and to prepare the techniques needed for practical telephone speech recognition systems. An experiment on a baseline system that uses a semi-syllable-based multi-subtree decoding structure and a classical Viterbi beam search algorithm achieves a keyword accuracy of 89.86%. Employing the dynamic threshold method raises keyword accuracy to 93.48%. Employing the 'speed up jumping strategy' improves performance further, to a keyword accuracy of 97.35%.
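The abstract does not describe how the dynamic threshold is computed, so the following is only a minimal sketch of the general idea: a log-domain Viterbi beam search in which the pruning threshold tightens whenever too many states remain active. The function names `prune` and `viterbi_beam`, the `beam` and `max_active` parameters, and the rule of capping the number of active states are all assumptions made for illustration, not the paper's method. The 'speed up jumping strategy' is not illustrated here.

```python
import math

NEG_INF = float("-inf")


def prune(scores, beam, max_active=None):
    """Keep states within `beam` of the best score.

    Assumed dynamic rule: if more than `max_active` states exist, raise the
    threshold to the score of the max_active-th best state (ties may keep more).
    """
    if not scores:
        return scores
    threshold = max(scores.values()) - beam
    if max_active is not None and len(scores) > max_active:
        kth_best = sorted(scores.values(), reverse=True)[max_active - 1]
        threshold = max(threshold, kth_best)
    return {s: v for s, v in scores.items() if v >= threshold}


def viterbi_beam(log_init, log_trans, log_emit, obs, beam=10.0, max_active=None):
    """Return (best_state_path, best_log_score) for an observation sequence.

    log_init[s], log_trans[s][t] and log_emit[s][o] are log-probabilities.
    """
    n = len(log_init)
    scores = {}
    for s in range(n):
        start = log_init[s] + log_emit[s][obs[0]]
        if start > NEG_INF:
            scores[s] = start
    scores = prune(scores, beam, max_active)

    backpointers = []
    for o in obs[1:]:
        new_scores, bp = {}, {}
        # Only states that survived pruning are expanded.
        for prev, score in scores.items():
            for nxt in range(n):
                cand = score + log_trans[prev][nxt] + log_emit[nxt][o]
                if cand > new_scores.get(nxt, NEG_INF):
                    new_scores[nxt] = cand
                    bp[nxt] = prev
        scores = prune(new_scores, beam, max_active)
        backpointers.append(bp)

    if not scores:
        raise ValueError("no state can explain the observations")

    # Trace back from the best final state.
    last = max(scores, key=scores.get)
    best_score = scores[last]
    path = [last]
    for bp in reversed(backpointers):
        last = bp[last]
        path.append(last)
    path.reverse()
    return path, best_score
```

A narrow `beam` or a small `max_active` reduces the work per frame at the risk of pruning the path that would eventually have scored best, which is the trade-off a dynamic threshold tries to manage.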
Similar resources
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
Lexical stress modeling for improved speech recognition of spontaneous telephone speech in the jupiter domain
This paper examines an approach of using lexical stress models to improve the speech recognition performance on spontaneous telephone speech. We analyzed the correlation of various pitch, energy, and duration measurements with lexical stress on a large corpus of spontaneous utterances, and identified the most informative features of stress using classification experiments. We incorporated the s...
VLSI Architecture of GMM Processing and Viterbi Decoder for 60,000-Word Real-Time Continuous Speech Recognition
We propose a low-memory-bandwidth, high-efficiency VLSI architecture for 60-k word real-time continuous speech recognition. Our architecture includes a cache architecture using the locality of speech recognition, beam pruning using a dynamic threshold, two-stage language model searching, a parallel Gaussian Mixture Model (GMM) architecture based on the mixture level and frame level, a parallel ...
A detection approach to search-space reduction for HMM state alignment in speaker verification
To support speaker verification (SV) in portable devices and in telephone servers with millions of users, a fast algorithm for hidden Markov model (HMM) alignment is necessary. Currently, the most popular algorithm is the Viterbi algorithm with beam search to reduce search-space; however, it is difficult to determine a suitable beam width beforehand. A small beam width may miss the optimal path...
Large vocabulary decoding and confidence estimation using word posterior probabilities
This paper investigates the estimation of word posterior probabilities based on word lattices and presents applications of these posteriors in a large vocabulary speech recognition system. A novel approach to integrating these word posterior probability distributions into a conventional Viterbi decoder is presented. The problem of the robust estimation of confidence scores from word posteriors ...
Publication date: 2002